Correlation Coefficient Based Average Textual Similarity Model for Information Retrieval System in Wide Area Networks
Abstract
In wide area networks, retrieving relevant text is a challenging task for information retrieval because most information requests are text based. This paper focuses on similarity measurement, performance evaluation and the design of information retrieval techniques using four similarity functions: Jaccard, Cosine, Dice and Overlap. The performance of these similarity functions is evaluated, using the vector space model, on the similarity between the documents that the search engine retrieves for the entered text. The correlation coefficient is used to evaluate the performance of the similarity functions. All possible combinations of the similarity functions are explored, and a textual similarity model is proposed for information retrieval systems in wide area networks.
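As a rough illustration of the ingredients named in the abstract, the sketch below computes the four similarity functions over term-frequency vectors and a Pearson correlation coefficient between two score lists. It is not the paper's implementation: the vector-space forms of Jaccard, Dice and Overlap are the commonly used textbook generalizations, and the tokenization, the helper names (`term_vector`, `average_similarity`, `pearson`), the query-versus-document comparison, and the plain averaging of the four scores are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def dot(x, y):
    """Inner product over the terms the two vectors share."""
    return sum(x[t] * y[t] for t in x if t in y)


def sq_norm(x):
    """Squared Euclidean norm of a term vector."""
    return sum(v * v for v in x.values())


def cosine(x, y):
    denom = math.sqrt(sq_norm(x) * sq_norm(y))
    return dot(x, y) / denom if denom else 0.0


def jaccard(x, y):
    d = dot(x, y)
    denom = sq_norm(x) + sq_norm(y) - d
    return d / denom if denom else 0.0


def dice(x, y):
    denom = sq_norm(x) + sq_norm(y)
    return 2 * dot(x, y) / denom if denom else 0.0


def overlap(x, y):
    denom = min(sq_norm(x), sq_norm(y))
    return dot(x, y) / denom if denom else 0.0


def average_similarity(x, y):
    """Hypothetical combination: unweighted mean of the four scores."""
    return (jaccard(x, y) + cosine(x, y) + dice(x, y) + overlap(x, y)) / 4


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


if __name__ == "__main__":
    # Made-up query and documents, for illustration only.
    query = term_vector("textual similarity in information retrieval")
    docs = [
        term_vector("similarity functions for information retrieval"),
        term_vector("load frequency control with bayesian networks"),
        term_vector("textual information retrieval in wide area networks"),
    ]
    cos_scores = [cosine(query, d) for d in docs]
    jac_scores = [jaccard(query, d) for d in docs]
    print("cosine :", cos_scores)
    print("jaccard:", jac_scores)
    print("correlation:", pearson(cos_scores, jac_scores))
```

A correlation close to 1 between two score lists would indicate that the two functions rank the retrieved documents in much the same way; how the paper actually uses the coefficient to select or combine functions is not stated in the abstract.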
Similar resources
A Study of Similarity Functions Used in Textual Information Retrieval in Wide Area Networks
The World Wide Web is a rich source of information. It continues to expand in size and complexity with the increasing use of the internet and social media, but retrieving relevant documents on the Web is becoming a challenge. This paper covers the goals, challenges and importance of similarity functions in information retrieval in wide area networks. This paper discusses t...
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
Efficient Agent-Based Dissemination of Textual Information
We study the problem of efficient dissemination of textual information over wide-area networks. Our dissemination architecture utilises middle-agents and sophisticated matching algorithms. The data model and query language are based on the well-known Boolean model from Information Retrieval. The main focus of this paper is the problem of matching incoming documents with submitted user profiles. ...
Load-Frequency Control: a GA based Bayesian Networks Multi-agent System
Bayesian Networks (BN) provide a robust probabilistic method of reasoning under uncertainty. They have been successfully applied in a variety of real-world tasks but they have received little attention in the area of load-frequency control (LFC). In practice, LFC systems use proportional-integral controllers. However, since these controllers are designed using a linear model, the nonlinearities...
A Combined Matching Function based Evolutionary Approach for development of Adaptive Information Retrieval System
The growth in the volume of the Web and other textual repositories has made the information retrieval task difficult, costly and in many cases very complex for the end user. In this context, search engines have become valuable tools to help users find content relevant to their information needs. However, finding relevant information based on the user's need is still a challenge. Naturally research on informat...